regexp(5)                                                          regexp(5)

NAME
     regexp: compile, step, advance - regular expression compile and match
     routines

SYNOPSIS
     #define INIT declarations
     #define GETC(void) getc code
     #define PEEKC(void) peekc code
     #define UNGETC(void) ungetc code
     #define RETURN(ptr) return code
     #define ERROR(val) error code

     #include <regexp.h>

     char *compile(char *instring, char *expbuf, char *endbuf, int eof);

     int step(char *string, char *expbuf);

     int advance(char *string, char *expbuf);

     extern char *loc1, *loc2, *locs;

DESCRIPTION
     These functions are general purpose regular expression matching routines
     to be used in programs that perform regular expression matching.
     These functions are defined by the regexp.h header file.

     The functions step and advance do pattern matching given a character
     string and a compiled regular expression as input.

     The function compile takes as input a regular expression as defined
     below and produces a compiled expression that can be used with step or
     advance.

     A regular expression specifies a set of character strings.  A member of
     this set of strings is said to be matched by the regular expression.
     Some characters have special meaning when used in a regular expression;
     other characters stand for themselves.

     The regular expressions available for use with the regexp functions are
     constructed as follows:

     Expression     Meaning

     c              the character c where c is not a special character.

     \c             the character c where c is any character, except a digit
                    in the range 1-9.

     ^              the beginning of the line being compared.

     $              the end of the line being compared.

     .              any character in the input.

     [s]            any character in the set s, where s is a sequence of
                    characters and/or a range of characters, for example,
                    [c-c].

     [^s]           any character not in the set s, where s is defined as
                    above.

     r*             zero or more successive occurrences of the regular
                    expression r.  The longest leftmost match is chosen.

     rx             the occurrence of regular expression r followed by the
                    occurrence of regular expression x.  (Concatenation)

     r\{m,n\}       any number of m through n successive occurrences of the
                    regular expression r.  The regular expression r\{m\}
                    matches exactly m occurrences; r\{m,\} matches at least
                    m occurrences.

     \(r\)          the regular expression r.
                    When \n (where n is a number greater than zero) appears
                    in a constructed regular expression, it stands for the
                    regular expression x where x is the nth regular
                    expression enclosed in \( and \) that appeared earlier
                    in the constructed regular expression.  For example,
                    \(r\)x\(y\)z\2 is the concatenation of regular
                    expressions rxyzy.

     Characters that have special meaning except when they appear within
     square brackets ([]) or are preceded by \ are:  ., *, [, \.  Other
     special characters, such as $ have special meaning in more restricted
     contexts.

     The character ^ at the beginning of an expression permits a successful
     match only immediately after a newline, and the character $ at the end
     of an expression requires a trailing newline.

     Two characters have special meaning only when used within square
     brackets.  The character - denotes a range, [c-c], unless it is just
     after the open bracket or before the closing bracket, [-c] or [c-] in
     which case it has no special meaning.  When used within brackets, the
     character ^ has the meaning complement of if it immediately follows the
     open bracket (example: [^c]); elsewhere between brackets (example:
     [c^]) it stands for the ordinary character ^.

     The special meaning of the \ operator can be escaped only by preceding
     it with another \, for example, \\.

     Programs must have the following five macros declared before the
     #include <regexp.h> statement.  These macros are used by the compile
     routine.  The macros GETC, PEEKC, and UNGETC operate on the regular
     expression given as input to compile.
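     The expression syntax described above can be tried out with a small
     test helper.  The sketch below is only an illustration and rests on an
     assumption: it uses the POSIX <regex.h> interface (regcomp and regexec)
     as a stand-in, because its basic regular expression syntax shares the
     \{m,n\}, \(r\), \n and bracket constructs listed here.  It is not the
     regexp.h interface this page documents, and bre_matches is a
     hypothetical name.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of regexp.h): returns 1 if the basic
 * regular expression `pattern` matches somewhere in `text`, 0 if it does
 * not, and -1 if the pattern cannot be compiled.  Passing no REG_EXTENDED
 * flag to regcomp() selects POSIX basic syntax. */
static int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

     For instance, bre_matches("^ab\\{2,3\\}c$", "abbc") reports a match,
     while the same pattern applied to "abc" does not.  Note that each
     backslash must be doubled inside a C string literal.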
     GETC           This macro returns the value of the next character
                    (byte) in the regular expression pattern.  Successive
                    calls to GETC should return successive characters of
                    the regular expression.

     PEEKC          This macro returns the next character (byte) in the
                    regular expression.  Immediately successive calls to
                    PEEKC should return the same character, which should
                    also be the next character returned by GETC.

     UNGETC         This macro causes the argument c to be returned by the
                    next call to GETC and PEEKC.  No more than one
                    character of pushback is ever needed and this character
                    is guaranteed to be the last character read by GETC.
                    The return value of the macro UNGETC(c) is always
                    ignored.

     RETURN(ptr)    This macro is used on normal exit of the compile
                    routine.  The value of the argument ptr is a pointer to
                    the character after the last character of the compiled
                    regular expression.  This is useful to programs which
                    have memory allocation to manage.

     ERROR(val)     This macro is the abnormal return from the compile
                    routine.  The argument val is an error number [see
                    DIAGNOSTICS below for meanings].  This call should
                    never return.

     The syntax of the compile routine is as follows:

          compile(instring, expbuf, endbuf, eof)

     The first parameter, instring, is never used explicitly by the compile
     routine but is useful for programs that pass down different pointers to
     input characters.  It is sometimes used in the INIT declaration (see
     below).
     Programs which call functions to input characters or have characters in
     an external array can pass down a value of (char *)0 for this
     parameter.

     The next parameter, expbuf, is a character pointer.  It points to the
     place where the compiled regular expression will be placed.

     The parameter endbuf is one more than the highest address where the
     compiled regular expression may be placed.  If the compiled expression
     cannot fit in (endbuf-expbuf) bytes, a call to ERROR(50) is made.

     The parameter eof is the character which marks the end of the regular
     expression.  This character is usually a /.

     Each program that includes the regexp.h header file must have a #define
     statement for INIT.  It is used for dependent declarations and
     initializations.  Most often it is used to set a register variable to
     point to the beginning of the regular expression so that this register
     variable can be used in the declarations for GETC, PEEKC, and UNGETC.
     Otherwise it can be used to declare external variables that might be
     used by GETC, PEEKC and UNGETC.  [See EXAMPLE below.]

     The first parameter to the step and advance functions is a pointer to a
     string of characters to be checked for a match.  This string should be
     null terminated.

     The second parameter, expbuf, is the compiled regular expression which
     was obtained by a call to the function compile.
     The function step returns non-zero if some substring of string matches
     the regular expression in expbuf and zero if there is no match.  If
     there is a match, two external character pointers are set as a side
     effect of the call to step.  The variable loc1 points to the first
     character that matched the regular expression; the variable loc2 points
     to the character after the last character that matches the regular
     expression.  Thus if the regular expression matches the entire input
     string, loc1 will point to the first character of string and loc2 will
     point to the null at the end of string.

     The function advance returns non-zero if the initial substring of
     string matches the regular expression in expbuf.  If there is a match,
     an external character pointer, loc2, is set as a side effect.  The
     variable loc2 points to the next character in string after the last
     character that matched.

     When advance encounters a * or \{ \} sequence in the regular
     expression, it will advance its pointer to the string to be matched as
     far as possible and will recursively call itself trying to match the
     rest of the string to the rest of the regular expression.  As long as
     there is no match, advance will back up along the string until it finds
     a match or reaches the point in the string that initially matched the *
     or \{ \}.  It is sometimes desirable to stop this backing up before the
     initial point in the string is reached.  If the external character
     pointer locs is equal to the current point in the string at some time
     during the backing up process, advance will break out of the loop that
     backs up and will return zero.
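     The difference between step and advance, and the meaning of loc1 and
     loc2, can be sketched as follows.  This is an illustration under stated
     assumptions, not the regexp.h implementation: it substitutes the POSIX
     <regex.h> routines for compile, step and advance, and the names
     match_span, step_like and advance_like are hypothetical.  The locs
     cutoff described above is not modeled.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for the external pointers loc1 and loc2. */
struct match_span {
    const char *loc1;   /* first character of the match */
    const char *loc2;   /* character after the last matched character */
};

/* step()-like search: succeeds if any substring of `string` matches the
 * basic regular expression `pattern`.  On success fills in *span. */
static int step_like(const char *pattern, const char *string,
                     struct match_span *span)
{
    regex_t re;
    regmatch_t m;
    int found;

    if (regcomp(&re, pattern, 0) != 0)
        return 0;                       /* treat a bad pattern as no match */
    found = (regexec(&re, string, 1, &m, 0) == 0);
    regfree(&re);
    if (found) {
        span->loc1 = string + m.rm_so;
        span->loc2 = string + m.rm_eo;
    }
    return found;
}

/* advance()-like match: succeeds only if the match starts at string[0].
 * regexec() reports the leftmost match, so a match exists at the first
 * character exactly when the leftmost match begins there. */
static int advance_like(const char *pattern, const char *string,
                        struct match_span *span)
{
    struct match_span s;

    if (!step_like(pattern, string, &s) || s.loc1 != string)
        return 0;
    span->loc2 = s.loc2;                /* advance() sets only loc2 */
    return 1;
}
```

     With the pattern [0-9][0-9]* and the string "abc123def", step_like
     succeeds with loc1 at the '1' and loc2 at the 'd', while advance_like
     fails because the string does not begin with a digit.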
     The external variables circf, sed, and nbra are reserved.

DIAGNOSTICS
     The function compile uses the macro RETURN on success and the macro
     ERROR on failure (see above).  The functions step and advance return
     non-zero on a successful match and zero if there is no match.  Errors
     are:

          11   range endpoint too large.
          16   bad number.
          25   \digit out of range.
          36   illegal or missing delimiter.
          41   no remembered search string.
          42   \( \) imbalance.
          43   too many \(.
          44   more than 2 numbers given in \{ \}.
          45   } expected after \.
          46   first number exceeds second in \{ \}.
          49   [ ] imbalance.
          50   regular expression overflow.
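     An application's ERROR handler usually wants to report these numbers
     as text.  The following is a minimal sketch of such a lookup; the
     function name regexp_errmsg is hypothetical and is not provided by
     regexp.h.  The messages are copied from the list above.

```c
#include <stddef.h>

/* Hypothetical helper: map an error number passed to ERROR(val) to the
 * message listed under DIAGNOSTICS.  Returns NULL for a number that is
 * not in the list. */
static const char *regexp_errmsg(int val)
{
    switch (val) {
    case 11: return "range endpoint too large";
    case 16: return "bad number";
    case 25: return "\\digit out of range";
    case 36: return "illegal or missing delimiter";
    case 41: return "no remembered search string";
    case 42: return "\\( \\) imbalance";
    case 43: return "too many \\(";
    case 44: return "more than 2 numbers given in \\{ \\}";
    case 45: return "} expected after \\";
    case 46: return "first number exceeds second in \\{ \\}";
    case 49: return "[ ] imbalance";
    case 50: return "regular expression overflow";
    default: return NULL;
    }
}
```

     A handler invoked from ERROR(val) could print regexp_errmsg(val) and
     then exit or longjmp, since the ERROR call must not return.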
EXAMPLE
     The following is an example of how the regular expression macros and
     calls might be defined by an application program:

          #define INIT        register char *sp = instring;
          #define GETC()      (*sp++)
          #define PEEKC()     (*sp)
          #define UNGETC(c)   (--sp)
          #define RETURN(c)   return (c)
          #define ERROR(c)    regerr(c)  /* application routine; must not return */

          #include <regexp.h>
           . . .
               (void) compile(*argv, expbuf, &expbuf[ESIZE], '\0');
           . . .
               if (step(linebuf, expbuf))
                    succeed;
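
     The contract that GETC, PEEKC and UNGETC must satisfy can be exercised
     without regexp.h.  The sketch below is an illustration only: read_count
     is a hypothetical consumer, written here to stand in for the kind of
     code inside compile that reads the numbers in \{m,n\}.  It assumes the
     pointer-based macro definitions of the example above, with const added
     so that string literals can be passed.

```c
/* Application-side macros in the style of the EXAMPLE section. */
#define INIT        const char *sp = instring;
#define GETC()      (*sp++)     /* return next character and consume it */
#define PEEKC()     (*sp)       /* return next character, do not consume */
#define UNGETC(c)   (--sp)      /* push back the character just read */

/* Hypothetical consumer (not part of regexp.h): reads a decimal count from
 * the start of the pattern, stores it in *value, and returns the first
 * character after the digits without consuming it. */
static int read_count(const char *instring, int *value)
{
    INIT
    int n = 0;
    int c;

    while ((c = GETC()) >= '0' && c <= '9')
        n = n * 10 + (c - '0');
    UNGETC(c);      /* one character of pushback: the last one read */
    *value = n;
    return PEEKC(); /* same character the next GETC() would return */
}
```

     Calling read_count("12,34", &v) stores 12 in v and returns ',', which
     shows the single character of pushback that UNGETC has to support.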